Training Set Properties and Decision-Tree Taggers: A Closer Look
Abstract
This paper examines three ways to improve part-of-speech tagging accuracy: increasing the number of training examples presented to the tree learner, increasing the number of word-specific subtrees grown, and increasing the number of ngrams (preceding parts of speech) per training example. Although the experimental results indicate that additional training data generally yields the greatest improvement in accuracy, they also demonstrate that word-specific subtrees can be useful and that trees considering two or more previous parts of speech in their classification decisions outperform those examining just one.
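The abstract names three levers: training-set size, word-specific subtrees, and the number of preceding tags used as features. The following is a minimal sketch of how those levers could fit together, assuming a plain ID3-style learner over categorical features. The paper's actual tree learner and feature encoding may differ. The function names (`make_examples`, `split_by_word`, `build_tree`, `classify`), the `<s>` padding symbol, the `min_count` threshold, and the toy corpus are all hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple, Union

TaggedSentence = List[Tuple[str, str]]   # [(word, tag), ...]
Example = Tuple[Dict[str, str], str]     # (features, gold tag)
Tree = Union[str, tuple]                 # leaf tag, or (feature, default_tag, branches)


def make_examples(sentences: List[TaggedSentence], n_prev: int) -> List[Example]:
    """One example per token: the lowercased word plus the n_prev preceding
    gold tags, padded with a hypothetical '<s>' symbol at sentence start."""
    examples: List[Example] = []
    for sent in sentences:
        tags = [t for _, t in sent]
        for i, (word, tag) in enumerate(sent):
            feats = {"word": word.lower()}
            for k in range(1, n_prev + 1):
                feats[f"tag-{k}"] = tags[i - k] if i >= k else "<s>"
            examples.append((feats, tag))
    return examples


def split_by_word(examples: List[Example], min_count: int) -> Dict[str, List[Example]]:
    """Assumed partitioning for word-specific subtrees: words seen at least
    min_count times get their own example pool; the rest share '<general>'."""
    counts = Counter(feats["word"] for feats, _ in examples)
    pools: Dict[str, List[Example]] = defaultdict(list)
    for feats, tag in examples:
        key = feats["word"] if counts[feats["word"]] >= min_count else "<general>"
        pools[key].append((feats, tag))
    return dict(pools)


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(examples: List[Example], features: List[str]) -> Tree:
    """ID3-style learner: split on the feature with the highest information gain."""
    labels = [tag for _, tag in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(feature: str) -> float:
        groups: Dict[str, List[str]] = defaultdict(list)
        for feats, tag in examples:
            groups[feats[feature]].append(tag)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    branches: Dict[str, Tree] = {}
    for value in {feats[best] for feats, _ in examples}:
        subset = [(f, t) for f, t in examples if f[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, majority, branches)


def classify(tree: Tree, feats: Dict[str, str]) -> str:
    """Walk the tree; unseen feature values fall back to the node's majority tag."""
    while isinstance(tree, tuple):
        feature, default, branches = tree
        tree = branches.get(feats.get(feature, ""), default)
    return tree


# Toy usage on a hypothetical two-sentence corpus.
corpus = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
examples = make_examples(corpus, n_prev=2)
tree = build_tree(examples, ["tag-1", "tag-2"])
print(classify(tree, {"tag-1": "NN", "tag-2": "DT"}))  # VBZ
print(sorted(split_by_word(examples, min_count=2)))    # ['<general>', 'the']
```

In this sketch, raising `n_prev` adds more preceding tags to each example, adding sentences to `corpus` grows the training set, and building one tree per pool returned by `split_by_word` is one possible way to approximate word-specific subtrees.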
Similar resources
A closer look at rock physics models and their assisted interpretation in seismic exploration
In exploration seismology, subsurface rocks, their fluid content, and their architecture affect reflected seismic waves through variations in travel time, reflection amplitude, and phase. The combined effects of these factors make subsurface interpretation using reflected waves very difficult. Therefore, assistance from other subsurface disciplines is needed i...
Decision Tree Studies in Estimating Breast Cancer Risk Using Single Nucleotide Polymorphisms
Abstract. Introduction: Decision trees are data mining tools used to collect information, make accurate predictions, and sift information from massive amounts of data; they are widely used in computational biology and bioinformatics. In bioinformatics, they can be used to predict diseases, including breast cancer. The use of genomic data, including single nucleotide polymorphisms, is a very important ...
Determining Factors Influencing Length of Stay and Predicting Length of Stay Using Data Mining in the General Surgery Department
Background: Length of stay is one of the most important indicators in assessing hospital performance. A shorter stay can reduce the costs per discharge and shift care from inpatient to less expensive post-acute settings. It can lead to a greater readmission rate, better resource management, and more efficient services. Objective: This study aimed to ident...
Steel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm
Results of damage prediction in buildings can be used as a useful tool for managing and decreasing seismic risk of earthquakes. In this study, damage spectrum and C4.5 decision tree algorithm were utilized for damage prediction in steel buildings during earthquakes. In order to prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...
A Comparison of Three Machine Learning Methods for Amazigh POS Tagging
Part-of-speech (POS) tagging plays a crucial role in many areas of natural language processing (NLP), including speech recognition, natural language parsing, information retrieval, and multi-word term extraction. This paper describes a set of experiments applying three state-of-the-art part-of-speech taggers to Amazigh texts, using a tagset of 28 tags. The taggers...